Similarity search of time-warped subsequences via a suffix tree

Authors

  • Sanghyun Park
  • Wesley W. Chu
  • Jeehee Yoon
  • Jung-Im Won
Abstract

This paper proposes an indexing technique for fast retrieval of similar subsequences using the time warping distance. The time warping distance is a more suitable similarity measure than the Euclidean distance in many applications where sequences may be of different lengths and/or different sampling rates. The proposed indexing technique employs a disk-based suffix tree as an index structure and uses lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and hence accelerate the query processing, it converts sequences in the continuous domain into sequences in the discrete domain and stores only a subset of the suffixes whose first values are different from those of the immediately preceding suffixes. Extensive experiments with real and synthetic data sequences revealed that the proposed approach significantly outperforms the sequential scan and LB scan approaches and scales well in a large volume of sequence databases.
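The ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the absolute-difference base cost, and the equal-width discretization are assumptions made for illustration. Only the rule of keeping a suffix when its first value differs from that of the immediately preceding suffix is taken from the abstract.

```python
from math import inf


def time_warping_distance(s, q):
    """Dynamic-programming time warping distance.

    Assumption: the base cost between two elements is |a - b|.
    """
    n, m = len(s), len(q)
    # d[i][j] holds the distance between the prefixes s[:i] and q[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            # An element may be stretched (repeated) in either sequence.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def discretize(seq, lo, hi, bins):
    """Map continuous values to category indices 0..bins-1.

    Assumption: equal-width categories over the range [lo, hi].
    """
    width = (hi - lo) / bins
    return [min(bins - 1, max(0, int((x - lo) / width))) for x in seq]


def sparse_suffix_starts(symbols):
    """Start positions of suffixes whose first symbol differs from the
    first symbol of the immediately preceding suffix."""
    return [i for i, c in enumerate(symbols) if i == 0 or c != symbols[i - 1]]


if __name__ == "__main__":
    # Sequences of different lengths can still match exactly under warping.
    print(time_warping_distance([1, 2, 2, 3], [1, 2, 3]))
    symbols = discretize([0.1, 0.2, 0.9, 0.95, 0.3], 0.0, 1.0, 4)
    print(symbols, sparse_suffix_starts(symbols))
```

In this sketch, runs of equal symbols contribute a single suffix, which is how the stored subset becomes smaller than the full set of suffixes.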

Similar resources

Similarity-Based Subsequence Search in Image Sequence Databases

This paper proposes an indexing technique for fast retrieval of similar image subsequences using the multi-dimensional time warping distance. The time warping distance is a more suitable similarity measure as compared to the Lp distance in many applications where sequences may be of different lengths and/or different sampling rates. Our indexing scheme employs a disk-based suffix tree as an ind...

Suffix Tree of Alignment: An Efficient Index for Similar Data

We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings A and B is a compacted trie representing all suffixes in A and B. It has |A|+ |B| leaves and can be constructed in O(|A|+ |B|) time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not ex...

Semantic Suffix Tree Clustering

This paper proposes a new algorithm, called Semantic Suffix Tree Clustering (SSTC), to cluster web search results containing semantic similarities. The distinctive methodology of the SSTC algorithm is that it simultaneously constructs the semantic suffix tree through an on-depth and on-breadth pass by using semantic similarity and string matching. The semantic similarity is derived from the Wor...

Improving Web Search Results Using Semantic Clustering

This paper considers the problem of search engines that are not capable of retrieving appropriate results for a given query. Most users are not able to give the appropriate query to get exactly what they want to retrieve. So the search engine retrieves a massive list of data, which is ranked by the page rank algorithm, a relevancy algorithm, or a human judgment algorithm. If the relevant resul...

Shape-based retrieval in time-series databases

Shape-based retrieval is defined as the operation that searches for the (sub)sequences whose shapes are similar to that of a query sequence regardless of their actual element values. In this paper, we propose a similarity model suitable for shape-based retrieval and present an indexing method for supporting the similarity model. The proposed similarity model enables the retrieval of similar shap...

Journal title:
  • Inf. Syst.

Volume 28  Issue 

Pages  -

Publication date 2003